Sustainable growth in complex networks
Based on an empirical analysis of the dependency networks of 18 Java
projects, we develop a novel model of network growth which considers both
an attachment mechanism and the addition of new nodes with a heterogeneous
distribution of their initial degree. Empirically, we find that the
cumulative degree distributions of initial degrees and of the final network
both follow power-law behaviors. For the total number of links as a
function of the network size, we find empirically a power-law scaling whose
exponent lies (at the beginning of the network evolution) between 1.25 and
2, while converging to 1 for large networks. This indicates a transition from
a growth regime with increasing network density towards a sustainable regime,
which prevents a collapse because of ever-increasing dependencies. Our
theoretical framework is able to predict relations between these exponents,
which also link issues of software engineering and developer activity. These
relations are verified by means of computer simulations and empirical
investigations. They indicate that the growth of real Open Source Software
networks occurs on the edge between two regimes, which are dominated either
by the initial degree distribution of added nodes or by the preferential
attachment mechanism. Hence, the heterogeneous degree distribution of newly
added nodes, found empirically, is essential to describe the laws of
sustainable growth in networks.
Comment: 5 pages, 2 figures, 1 table
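The growth mechanism described above (new nodes arriving with a power-law-distributed initial degree and attaching preferentially to existing nodes) can be sketched as follows. This is a minimal illustrative simulation, not the authors' model: the initial-degree exponent (2.5), cutoff, and seed graph are assumed values, and the final log-log slope estimate is only a crude check of the link-scaling exponent.

```python
import math
import random
from collections import defaultdict

def grow_network(n_nodes, beta=2.5, k0_max=20, seed=42):
    """Grow a network: each new node draws an initial degree k0 from a
    truncated power law P(k0) ~ k0^-beta and attaches its k0 stubs to
    existing nodes with probability proportional to their degree
    (preferential attachment)."""
    rng = random.Random(seed)
    degree = defaultdict(int)
    edges = [(0, 1), (1, 2), (0, 2)]  # small seed triangle
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # degree-weighted sampling pool: each node appears once per incident edge
    targets = [u for e in edges for u in e]
    weights = [k ** (-beta) for k in range(1, k0_max + 1)]
    sizes, links = [], []
    for new in range(3, n_nodes):
        k0 = rng.choices(range(1, k0_max + 1), weights=weights)[0]
        k0 = min(k0, new)  # cannot exceed the number of existing nodes
        chosen = set()
        while len(chosen) < k0:
            chosen.add(rng.choice(targets))  # preferential attachment
        for t in chosen:
            degree[new] += 1
            degree[t] += 1
            targets.extend([new, t])
        sizes.append(new + 1)
        links.append(len(targets) // 2)  # 2 pool entries per edge
    return sizes, links

sizes, links = grow_network(2000)
# crude log-log slope of L(N), the link-scaling exponent
alpha = (math.log(links[-1]) - math.log(links[100])) / (
    math.log(sizes[-1]) - math.log(sizes[100]))
print(f"effective link-scaling exponent over the sampled range: {alpha:.2f}")
```

With a finite-mean initial-degree distribution, the number of links grows roughly linearly in the network size, so the measured slope should sit near 1, consistent with the sustainable regime described in the abstract.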
VoG: Summarizing and Understanding Large Graphs
How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the "importance" of a set of discovered subgraphs in a large graph? These are exactly the problems we focus on. Our main ideas are to construct a "vocabulary" of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary. We measure success in a well-founded way by means of the Minimum Description Length (MDL) principle: a subgraph is included in the summary if it decreases the total description length of the graph. Our contributions are three-fold: (a) formulation: we provide a principled encoding scheme to choose vocabulary subgraphs; (b) algorithm: we develop VoG, an efficient method to minimize the description cost; and (c) applicability: we report experimental results on multi-million-edge real graphs, including Flickr and the Notre Dame web graph.
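The MDL intuition can be made concrete with a back-of-the-envelope cost model (this is an illustrative simplification, not VoG's actual encoding scheme): listing a k-spoke star edge-by-edge costs about 2k node identifiers, while naming the hub and its spokes costs only k+1.

```python
import math

def bits_edges(n, m):
    """Naive cost: list each of m edges as a pair of node ids
    (log2(n) bits per id)."""
    return m * 2 * math.log2(n)

def bits_star(n, k):
    """Cost of describing a k-spoke star: one hub id plus k spoke ids.
    A toy cost model for illustration, not VoG's encoding."""
    return (1 + k) * math.log2(n)

n = 1_000_000          # a million-node graph
k = 500                # a 500-spoke star
saving = bits_edges(n, k) - bits_star(n, k)
print(f"bits saved by summarizing the star: {saving:.0f}")
```

Under MDL, the star would be included in the summary precisely because this saving is positive.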
Kronecker Graphs: An Approach to Modeling Networks
How can we model networks with a mathematically tractable model that allows
for rigorous analysis of network properties? Networks exhibit a long list of
surprising properties: heavy tails for the degree distribution; small
diameters; and densification and shrinking diameters over time. Most present
network models either fail to match several of the above properties, are
complicated to analyze mathematically, or both. In this paper we propose a
generative model for networks that is both mathematically tractable and can
generate networks that have the above mentioned properties. Our main idea is to
use the Kronecker product to generate graphs that we refer to as "Kronecker
graphs".
First, we prove that Kronecker graphs naturally obey common network
properties. We also provide empirical evidence showing that Kronecker graphs
can effectively model the structure of real networks.
We then present KronFit, a fast and scalable algorithm for fitting the
Kronecker graph generation model to large real networks. A naive approach to
fitting would take super-exponential time. In contrast, KronFit takes linear
time, by exploiting the structure of Kronecker matrix multiplication and by
using statistical simulation techniques.
Experiments on large real and synthetic networks show that KronFit finds
accurate parameters that indeed very well mimic the properties of target
networks. Once fitted, the model parameters can be used to gain insights about
the network structure, and the resulting synthetic graphs can be used for
null-models, anonymization, extrapolations, and graph summarization.
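The core construction is easy to sketch: repeatedly take the Kronecker product of a small initiator matrix with itself, then treat the resulting entries as independent edge probabilities (a stochastic Kronecker graph). The 2x2 initiator values below are assumptions chosen only for illustration, not fitted parameters.

```python
import numpy as np

# assumed 2x2 initiator of edge probabilities (illustrative values)
theta = np.array([[0.9, 0.5],
                  [0.5, 0.1]])

def kronecker_power(initiator, k):
    """k-th Kronecker power of the initiator: a (2^k x 2^k) matrix of
    edge probabilities for a stochastic Kronecker graph."""
    p = initiator
    for _ in range(k - 1):
        p = np.kron(p, initiator)
    return p

def sample_graph(prob_matrix, rng):
    """Realize one random graph: include each (directed) edge
    independently with its Kronecker probability."""
    return (rng.random(prob_matrix.shape) < prob_matrix).astype(int)

rng = np.random.default_rng(0)
P = kronecker_power(theta, 8)          # 256-node probability matrix
A = sample_graph(P, rng)
print("nodes:", P.shape[0],
      "expected edges:", round(P.sum(), 1),
      "sampled edges:", int(A.sum()))
```

Note the self-similarity: the expected edge count is simply (sum of the initiator entries) raised to the k-th power, which is what makes the model mathematically tractable.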
A dissemination strategy for immunizing scale-free networks
We consider the problem of distributing a vaccine for immunizing a scale-free
network against a given virus or worm. We introduce a new method, based on
vaccine dissemination, that seems to reflect more accurately what is expected
to occur in real-world networks. Also, since the dissemination is performed
using only local information, the method can be easily employed in practice.
Using a random-graph framework, we analyze our method both mathematically and
by means of simulations. We demonstrate its efficacy regarding the trade-off
between the expected number of nodes that receive the vaccine and the network's
resulting vulnerability to develop an epidemic as the virus or worm attempts to
infect one of its nodes. For some scenarios, the new method is seen to render
the network practically invulnerable to attacks while requiring only a small
fraction of the nodes to receive the vaccine.
Power-hop: A pervasive observation for real complex networks
Complex networks have been shown to exhibit universal properties, with one of the most consistent patterns being the scale-free degree distribution, but are there regularities obeyed by the r-hop neighborhood in real networks? We answer this question by identifying another power-law pattern that describes the relationship between the fraction of node pairs C(r) within r hops and the hop count r. This scale-free distribution is pervasive and describes a large variety of networks, ranging from social and urban to technological and biological networks. In particular, inspired by the definition of the fractal correlation dimension D2 on a point-set, we consider the hop count r to be the underlying distance metric between two vertices of the network, and we examine the scaling of C(r) with r. We find that this relationship follows a power law in real networks within the range 2 < r < d, where d is the effective diameter of the network, that is, the 90-th percentile distance. We term this relationship the power-hop and the corresponding power-law exponent the power-hop exponent h. We provide theoretical justification for this pattern under successful existing network models, while we analyze a large set of real and synthetic network datasets and show the pervasiveness of the power-hop pattern.
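The quantity C(r) above is straightforward to compute exactly on small graphs: run a BFS from every node, count pairs by shortest-path distance, and accumulate. The sketch below uses a toy chain-of-cliques graph purely to exercise the code; fitting the power-hop exponent h would be a log-log regression on top of this.

```python
from collections import deque

def hop_fractions(adj):
    """C(r): fraction of ordered node pairs whose shortest-path distance
    is at most r, computed by BFS from every node."""
    n = len(adj)
    counts = {}  # distance -> number of ordered pairs at that distance
    for s in range(n):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            if v != s:
                counts[d] = counts.get(d, 0) + 1
    total = n * (n - 1)
    cum, frac = 0, {}
    for r in sorted(counts):
        cum += counts[r]
        frac[r] = cum / total  # cumulative fraction within r hops
    return frac

# toy graph: three 4-cliques joined in a chain, just to exercise the code
adj = {i: set() for i in range(12)}
def connect(u, v):
    adj[u].add(v); adj[v].add(u)
for base in (0, 4, 8):
    for i in range(base, base + 4):
        for j in range(i + 1, base + 4):
            connect(i, j)
connect(3, 4); connect(7, 8)

C = hop_fractions(adj)
for r, c in C.items():
    print(f"C({r}) = {c:.3f}")
```

By construction C(r) is non-decreasing and reaches 1 at the diameter of a connected graph, which is what the power-hop pattern constrains within 2 < r < d.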
Generic Subsequence Matching Framework: Modularity, Flexibility, Efficiency
Subsequence matching has appeared to be an ideal approach for solving many
problems related to the fields of data mining and similarity retrieval. It has
been shown that almost any data class (audio, image, biometrics, signals) is or
can be represented by some kind of time series or string of symbols, which can
be seen as an input for various subsequence matching approaches. The variety of
data types, specific tasks and their partial or full solutions is so wide that
the choice, implementation and parametrization of a suitable solution for a
given task might be complicated and time-consuming; a possibly fruitful
combination of fragments from different research areas may not be obvious nor
easy to realize. The leading authors in this field also mention the
implementation bias that makes a proper comparison of competing approaches
difficult. We therefore present a new generic Subsequence Matching Framework
(SMF) that tries to overcome the aforementioned problems by a uniform framework
that simplifies and speeds up the design, development and evaluation of
subsequence-matching systems. We identify several relatively separate subtasks
that are solved differently across the literature; SMF makes it possible to
combine them in a straightforward manner, achieving new quality and efficiency.
This framework can be used in many application domains and its components can
be reused effectively. Its strictly modular architecture and openness also
enable the involvement of efficient solutions from different fields, for
instance efficient metric-based indexes.
Comment: This is an extended version of a paper published at DEXA 2012.
Behavior of susceptible-infected-susceptible epidemics on heterogeneous networks with saturation
We investigate saturation effects in susceptible-infected-susceptible (SIS)
models of the spread of epidemics in heterogeneous populations. The structure
of interactions in the population is represented by networks with connectivity
distribution $P(k)$, including scale-free (SF) networks with power-law
distributions $P(k) \sim k^{-\gamma}$. Considering cases where the transmission
of infection between nodes depends on their connectivity, we introduce a
saturation function $c(k)$ which reduces the infection transmission rate
across an edge going out from a node with high connectivity $k$. A mean-field
approximation that neglects degree-degree correlations then leads to a finite
epidemic threshold for SF networks. We also find, in this approximation, the
fraction of infected individuals among those with a given degree, for infection
rates close to the threshold. We investigate via computer simulation the
contact process on a heterogeneous regular lattice and compare the results with
those obtained from mean-field theory with and without neglect of degree-degree
correlations.
Comment: 6 figures
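The effect of the saturation function on the epidemic threshold can be illustrated with a degree-based mean-field calculation. Everything below is a schematic sketch under assumed forms: the saturation function c(k) = 1/(1 + k/ks), the exponent gamma = 2.5, and the degree cutoff are illustrative choices, not the paper's; the stationary state is found by iterating the usual SIS self-consistency equation.

```python
def sis_prevalence(lmbda, gamma=2.5, kmax=100, ks=10.0, iters=500):
    """Stationary prevalence of degree-based mean-field SIS with an
    illustrative saturation factor c(k) = 1/(1 + k/ks) that damps
    transmission out of high-degree nodes. Solves the self-consistency
    equation for theta (probability that a random edge points to an
    infectious, still-transmitting neighbor) by fixed-point iteration."""
    degrees = range(1, kmax + 1)
    pk = [k ** (-gamma) for k in degrees]       # power-law degree dist.
    norm = sum(pk)
    pk = [p / norm for p in pk]
    mean_k = sum(k * p for k, p in zip(degrees, pk))
    theta = 0.5  # initial guess
    for _ in range(iters):
        # stationary infected fraction in each degree class
        rho = [lmbda * k * theta / (1 + lmbda * k * theta) for k in degrees]
        # saturation c(k) applied at the transmitting (source) node
        theta = sum(k * p * r / (1 + k / ks)
                    for k, p, r in zip(degrees, pk, rho)) / mean_k
    rho = [lmbda * k * theta / (1 + lmbda * k * theta) for k in degrees]
    return sum(p * r for p, r in zip(pk, rho))

low = sis_prevalence(0.01)   # below the (finite) threshold: dies out
high = sis_prevalence(1.0)   # above the threshold: endemic state
print(f"prevalence at lambda=0.01: {low:.4f}, at lambda=1.0: {high:.4f}")
```

The key qualitative point matches the abstract: with the saturation factor the self-consistency map has slope below 1 at theta = 0 for small enough lambda, so the infection dies out, i.e. the threshold is finite even for a scale-free network.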
From Cooperative Scans to Predictive Buffer Management
In analytical applications, database systems often need to sustain workloads
with multiple concurrent scans hitting the same table. The Cooperative Scans
(CScans) framework, which introduces an Active Buffer Manager (ABM) component
into the database architecture, has been the most effective and elaborate
response to this problem, and was initially developed in the X100 research
prototype. We now report on the experiences of integrating Cooperative
Scans into its industrial-strength successor, the Vectorwise database product.
During this implementation we invented a simpler optimization of concurrent
scan buffer management, called Predictive Buffer Management (PBM). PBM is based
on the observation that in a workload with long-running scans, the buffer
manager has quite a bit of information on the workload in the immediate future,
such that an approximation of the ideal OPT algorithm becomes feasible. In the
evaluation on both synthetic benchmarks as well as a TPC-H throughput run we
compare the benefits of naive buffer management (LRU) versus CScans, PBM and
OPT; showing that PBM achieves benefits close to Cooperative Scans, while
incurring much lower architectural impact.
Comment: VLDB 2012
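The OPT baseline that PBM approximates is Belady's rule: on a miss with a full buffer, evict the page whose next use lies farthest in the future. The sketch below is a toy illustration of that rule on a page-access trace, not Vectorwise's implementation; PBM's insight is that long-running scans make this "future knowledge" approximately available in practice.

```python
def opt_evictions(accesses, capacity):
    """Belady's OPT policy: on a miss with a full buffer, evict the page
    whose next access is farthest in the future (or never recurs).
    Returns the number of buffer misses for the whole trace."""
    buffer, misses = set(), 0
    for i, page in enumerate(accesses):
        if page in buffer:
            continue                      # hit: nothing to do
        misses += 1
        if len(buffer) >= capacity:
            def next_use(p):
                # position of p's next access after i, or infinity
                for j in range(i + 1, len(accesses)):
                    if accesses[j] == p:
                        return j
                return float("inf")       # never used again: ideal victim
            buffer.remove(max(buffer, key=next_use))
        buffer.add(page)
    return misses

trace = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1]
print("OPT misses:", opt_evictions(trace, capacity=3))
```

On this trace OPT keeps the frequently re-scanned pages 1 and 2 resident and sacrifices the one-shot pages, which is exactly the behavior a scan-aware buffer manager wants to approximate.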
Procedure for Obtaining an Entry Visa for Citizens of Bosnia and Herzegovina
With the rise of online social networks and smartphones that record the user's location, a new type of online social network has gained popularity during the last few years, the so-called Location-based Social Networks (LBSNs). In such networks, users voluntarily share their location with their friends via a check-in. In exchange they get recommendations tailored to their particular location as well as special deals that businesses offer when users check in frequently. LBSNs started as specialized platforms such as Gowalla and Foursquare; however, their immense popularity has led online social networking giants like Facebook to adopt this functionality. The spatial aspect of LBSNs directly ties the physical with the online world, creating a very rich ecosystem where users interact with their friends online as well as declare their physical (co-)presence in various locations. Such a rich environment calls for novel analytic tools that can model the aforementioned types of interactions. In this work, we propose to model and analyze LBSNs using Tensors and Tensor Decompositions, powerful analytical tools that have enjoyed great growth and success in fields like Machine Learning, Data Mining, and Signal Processing. By doing so, we identify tightly knit, hidden communities of users and the locations they frequent. In addition to Tensor Decompositions, we use Signal Processing tools that have been previously used in Direction of Arrival (DOA) estimation, in order to study the temporal dynamics of hidden communities in LBSNs.
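The basic modeling step can be sketched directly: check-ins become a three-way (user x location x time) count tensor, and decomposing it reveals co-occurring groups. The toy check-in triples below are assumptions for illustration, and the sketch uses simple HOSVD-style factors (leading singular vectors of each mode unfolding) rather than the specific decomposition the paper employs.

```python
import numpy as np

# toy check-ins as (user, location, time-slot) triples -- assumed data,
# purely for illustration
checkins = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 1, 1),
            (2, 2, 2), (2, 2, 3), (3, 2, 2), (3, 3, 3)]

# build the 3-way count tensor: users x locations x time slots
T = np.zeros((4, 4, 4))
for u, l, t in checkins:
    T[u, l, t] += 1

def unfold(tensor, mode):
    """Mode-n unfolding: matricize the tensor along one mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# HOSVD-style factors: leading left singular vectors of each unfolding;
# rows embed users / locations / time slots into a 2-D latent space
factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :2]
           for m in range(3)]
users, locations, times = factors
print("user embedding shape:", users.shape)
```

In this toy data users 0-1 and users 2-3 check in at disjoint locations and times, so their latent embeddings land on orthogonal components, the kind of tightly knit, hidden community structure the decomposition is meant to surface.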
Search in Complex Networks: A New Method of Naming
We suggest a method for routing when the source does not possess full
information about the shortest path to the destination. The method is
particularly useful for scale-free networks, and exploits their unique
characteristics. By assigning new (short) names to nodes (aka labelling) we are
able to significantly reduce the memory requirement at the routers, yet we
succeed in routing with high probability through paths very close in distance
to the shortest ones.
Comment: 5 pages, 4 figures